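Sparse coding has been incorporated into models of the visual cortex for its computational advantages and its connection to biology. However, how the degree of sparsity contributes to performance on visual tasks is not well understood. In this work, sparse coding is integrated into an existing hierarchical V2 model (Hosoya and Hyvärinen, 2015), replacing its independent component analysis (ICA) with an explicit sparse coding in which the degree of sparsity can be controlled. After training, the sparse coding basis functions with a higher degree of sparsity resembled qualitatively different structures, such as curves and corners. The contributions of the models were assessed with image classification tasks, specifically tasks associated with mid-level vision: figure-ground classification, texture classification, and angle prediction between two line stimuli. In addition, the models were assessed against a texture sensitivity measure reported in V2 (Freeman et al., 2013) and on a deleted-region inference task. The experimental results show that, while sparse coding performed worse than ICA at classifying images, only sparse coding was able to better match the texture sensitivity level of V2 and to infer deleted image regions, in both cases by increasing the degree of sparsity. Higher degrees of sparsity allowed inference over larger deleted image regions. The mechanism that gives sparse coding this inference capability is described here.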
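To make "controlling the degree of sparsity" concrete, the following is a minimal sketch of sparse coding inference with ISTA (iterative soft thresholding). The abstract does not say which solver the authors use, so ISTA, the penalty weight `lam`, and the toy dictionary `D` are all assumptions chosen for illustration. A larger `lam` pushes more coefficients to exactly zero.

```python
def soft_threshold(x, t):
    """Shrink x toward zero by t (proximal operator of the L1 norm)."""
    if x > t:
        return x - t
    if x < -t:
        return x + t
    return 0.0


def ista(D, x, lam, step, n_iter=200):
    """Approximately minimise 0.5*||x - D a||^2 + lam*||a||_1 over the code a.

    D is a list of rows (signal_dim x n_atoms); x is a list of length signal_dim.
    step should not exceed 1 / (largest eigenvalue of D^T D).
    """
    n_atoms = len(D[0])
    a = [0.0] * n_atoms
    for _ in range(n_iter):
        # Residual r = D a - x
        r = [sum(D[i][j] * a[j] for j in range(n_atoms)) - x[i]
             for i in range(len(x))]
        # Gradient of the quadratic term: g = D^T r
        g = [sum(D[i][j] * r[i] for i in range(len(x)))
             for j in range(n_atoms)]
        # Gradient step followed by soft thresholding
        a = [soft_threshold(a[j] - step * g[j], step * lam)
             for j in range(n_atoms)]
    return a


def count_nonzero(a):
    """Number of active (non-zero) coefficients in a code."""
    return sum(1 for v in a if v != 0.0)


# Hypothetical overcomplete dictionary with three unit-norm atoms in 2-D.
D = [[1.0, 0.0, 0.6],
     [0.0, 1.0, 0.8]]
x = [1.0, 0.1]
for lam in (0.01, 0.1, 0.5):
    code = ista(D, x, lam=lam, step=0.5)
    print(lam, count_nonzero(code), code)
```

In this sketch the sparsity level is set only through `lam`; a hierarchical model like the one in the abstract would apply such a coding step to learned basis functions rather than to a hand-written dictionary.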
In intensively managed forests in Europe, where forests are divided into stands of small size and may show heterogeneity within stands, a high spatial resolution (10 - 20 meters) is arguably needed to capture the differences in canopy height. In this work, we developed a deep learning model based on multi-stream remote sensing measurements to create a high-resolution canopy height map over the "Landes de Gascogne" forest in France, a large maritime pine plantation of 13,000 km$^2$ with flat terrain and intensive management. This area is characterized by even-aged and mono-specific stands, of a typical length of a few hundred meters, harvested every 35 to 50 years. Our deep learning U-Net model uses multi-band images from Sentinel-1 and Sentinel-2 with composite time averages as input to predict tree height derived from GEDI waveforms. The evaluation is performed with external validation data from forest inventory plots and a stereo 3D reconstruction model based on SkySat imagery available at specific locations. We trained seven different U-Net models based on a combination of Sentinel-1 and Sentinel-2 bands to evaluate the importance of each instrument in the dominant height retrieval. The model outputs allow us to generate a 10 m resolution canopy height map of the whole "Landes de Gascogne" forest area for 2020 with a mean absolute error of 2.02 m on the test dataset. The best predictions were obtained using all available satellite layers from Sentinel-1 and Sentinel-2, but using only one satellite source also provided good predictions. For all validation datasets in coniferous forests, our model showed better metrics than previous canopy height models available in the same region.
A core process in human cognition is analogical mapping: the ability to identify a similar relational structure between different situations. We introduce a novel task, Visual Analogies of Situation Recognition, adapting the classical word-analogy task into the visual domain. Given a triplet of images, the task is to select an image candidate B' that completes the analogy (A to A' is like B to what?). Unlike previous work on visual analogy that focused on simple image transformations, we tackle complex analogies requiring understanding of scenes. We leverage situation recognition annotations and the CLIP model to generate a large set of 500k candidate analogies. Crowdsourced annotations for a sample of the data indicate that humans agree with the dataset label ~80% of the time (chance level 25%). Furthermore, we use human annotations to create a gold-standard dataset of 3,820 validated analogies. Our experiments demonstrate that state-of-the-art models do well when distractors are chosen randomly (~86%), but struggle with carefully chosen distractors (~53%, compared to 90% human accuracy). We hope our dataset will encourage the development of new analogy-making models. Website: https://vasr-dataset.github.io/
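The analogy format (A to A' is like B to what?) can be illustrated with the vector-arithmetic heuristic familiar from word analogies. This is only a sketch under assumptions: the abstract does not state that any evaluated model works this way, and the embeddings here are hypothetical hand-written vectors, not CLIP outputs. The function picks the candidate whose embedding is closest, by cosine similarity, to `B + (A' - A)`.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def solve_analogy(a, a_prime, b, candidates):
    """Return the index of the candidate closest to b + (a_prime - a)."""
    target = [bi + (api - ai) for ai, api, bi in zip(a, a_prime, b)]
    return max(range(len(candidates)),
               key=lambda i: cosine(target, candidates[i]))


# Hypothetical 2-D embeddings; four candidates give a 25% chance level.
a, a_prime, b = [1.0, 0.0], [1.0, 1.0], [2.0, 0.0]
candidates = [[2.0, 1.0], [0.0, -1.0], [-1.0, 0.0], [1.0, -1.0]]
print(solve_analogy(a, a_prime, b, candidates))
```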
The attention mechanism is considered the backbone of the widely-used Transformer architecture. It contextualizes the input by computing input-specific attention matrices. We find that this mechanism, while powerful and elegant, is not as important as typically thought for pretrained language models. We introduce PAPA, a new probing method that replaces the input-dependent attention matrices with constant ones -- the average attention weights over multiple inputs. We use PAPA to analyze several established pretrained Transformers on six downstream tasks. We find that without any input-dependent attention, all models achieve competitive performance -- an average relative drop of only 8% from the probing baseline. Further, little or no performance drop is observed when replacing half of the input-dependent attention matrices with constant (input-independent) ones. Interestingly, we show that better-performing models lose more from applying our method than weaker models, suggesting that the utilization of the input-dependent attention mechanism might be a factor in their success. Our results motivate research on simpler alternatives to input-dependent attention, as well as on methods for better utilization of this mechanism in the Transformer architecture.
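The core substitution described above, replacing input-dependent attention matrices with their average over multiple inputs, can be sketched as follows. This is an illustrative toy, not the PAPA implementation: it assumes all inputs share one sequence length, handles a single head, and uses hypothetical helper names (`average_attention`, `apply_attention`).

```python
def average_attention(matrices):
    """Element-wise mean of same-shaped attention matrices (lists of rows)."""
    n = len(matrices)
    rows, cols = len(matrices[0]), len(matrices[0][0])
    return [[sum(m[i][j] for m in matrices) / n for j in range(cols)]
            for i in range(rows)]


def apply_attention(attn, values):
    """Mix value vectors with attention weights: out[i] = sum_j attn[i][j] * values[j]."""
    dim = len(values[0])
    return [[sum(attn[i][j] * values[j][k] for j in range(len(values)))
             for k in range(dim)]
            for i in range(len(attn))]


# Attention matrices observed on two hypothetical inputs (rows sum to 1).
observed = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[0.0, 1.0], [1.0, 0.0]],
]
constant_attn = average_attention(observed)

# At probing time the constant matrix is used for every new input,
# so the mixing weights no longer depend on the input itself.
new_values = [[2.0], [4.0]]
print(apply_attention(constant_attn, new_values))
```

Because the average of row-stochastic matrices is itself row-stochastic, the constant matrix still produces a convex combination of the value vectors.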
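Hyperparameter optimization is the process of identifying an appropriate hyperparameter configuration for a given machine learning model. For smaller data sets, an exhaustive search is possible; however, as data size and model complexity increase, the number of configuration evaluations becomes the main computational bottleneck. A promising paradigm for tackling this type of problem is surrogate-based optimization. The main idea underlying this paradigm is an incrementally updated model of the relation between the hyperparameter space and the output (target) space; the data for this model are obtained by evaluating the main learning engine, for example a factorization machine-based model. By learning to approximate the hyperparameter-target relation, the surrogate (machine learning) model can be used to score large numbers of hyperparameter configurations, exploring parts of the configuration space beyond the reach of direct learning engine evaluation. Commonly, a surrogate is selected before the optimization is initialized and remains the same during the search. We investigated whether dynamically switching surrogates during the optimization itself is a sensible idea of practical relevance for selecting the most appropriate factorization machine-based models for large-scale online recommendation. We conducted benchmarks on data sets containing hundreds of millions of instances against established baselines such as random forest-based and Gaussian process-based surrogates. The results indicate that surrogate switching can offer good performance while requiring fewer learning engine evaluations.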
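The loop described above can be sketched on a one-dimensional toy problem. Everything specific here is an assumption made for illustration: the two toy surrogates (nearest neighbour and a least-squares line) stand in for the random forest and Gaussian process surrogates named in the abstract, the switching rule (lowest leave-one-out error on the evaluations so far) is not taken from the source, and `objective` stands in for an expensive learning engine evaluation.

```python
import random


def nearest_neighbor_surrogate(history):
    """Predict the target of the closest already-evaluated configuration."""
    def predict(x):
        return min(history, key=lambda p: abs(p[0] - x))[1]
    return predict


def linear_surrogate(history):
    """Predict with a least-squares line fitted to the evaluated configurations."""
    n = len(history)
    mx = sum(x for x, _ in history) / n
    my = sum(y for _, y in history) / n
    var = sum((x - mx) ** 2 for x, _ in history)
    cov = sum((x - mx) * (y - my) for x, y in history)
    slope = cov / var if var > 0 else 0.0
    return lambda x: my + slope * (x - mx)


def loo_error(fit, history):
    """Mean absolute leave-one-out error of a surrogate family on the history."""
    total = 0.0
    for i, (x, y) in enumerate(history):
        rest = history[:i] + history[i + 1:]
        total += abs(fit(rest)(x) - y)
    return total / len(history)


def optimize(objective, n_init=4, n_rounds=10, n_candidates=200, seed=0):
    """Minimise objective on [0, 1], re-selecting the surrogate every round."""
    rng = random.Random(seed)
    history = [(x, objective(x))
               for x in (rng.uniform(0, 1) for _ in range(n_init))]
    surrogates = {"nearest": nearest_neighbor_surrogate,
                  "linear": linear_surrogate}
    used = []
    for _ in range(n_rounds):
        # Switch to whichever surrogate currently explains the data best.
        name = min(surrogates, key=lambda k: loo_error(surrogates[k], history))
        used.append(name)
        predict = surrogates[name](history)
        # Score many cheap candidates, then spend one real evaluation.
        candidates = [rng.uniform(0, 1) for _ in range(n_candidates)]
        best = min(candidates, key=predict)
        history.append((best, objective(best)))
    return min(history, key=lambda p: p[1]), used


best, used = optimize(lambda x: (x - 0.3) ** 2)
print(best, used)
```

The sketch spends only `n_init + n_rounds` objective evaluations while scoring `n_rounds * n_candidates` configurations with the surrogate, which is the trade-off the paragraph describes. It has no exploration term, so it should not be read as a practical optimizer.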
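Concept induction, which is based on formal logical reasoning over description logics, has been used in ontology engineering to create ontology (TBox) axioms from the base data (ABox) graph. In this paper, we show that it can also be used to explain data differentials, for example in the context of Explainable AI (XAI), and we show that it can in fact be done in a way that is meaningful to a human observer. Our approach uses a large class hierarchy, curated from the Wikipedia category hierarchy, as background knowledge.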
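Foundation Models (FMs) have demonstrated unprecedented capabilities, including zero-shot learning, high-fidelity data synthesis, and out-of-domain generalization. However, as we show in this paper, FMs still have poor out-of-the-box performance on expert tasks (e.g., retrieving technical illustrations from car manuals given language queries), for which the data is either unseen or belongs to a long-tail part of the data distribution of the huge datasets used for FM pre-training. This underlines the necessity of explicitly evaluating and fine-tuning FMs on such expert tasks, which are arguably the ones that appear most often in practical real-world applications. In this paper, we propose the FETA benchmark, built around the task of teaching FMs to understand technical documentation by learning to match its graphical illustrations to the corresponding language descriptions. Our FETA benchmark focuses on text-to-image and image-to-text retrieval in public car manuals and sales catalogue brochures. FETA is equipped with a procedure for completely automatic annotation extraction (code will be released upon acceptance), allowing easy extension of FETA to more documentation types and application domains in the future. Our automatic annotation leads to an automated performance metric shown to be consistent with metrics computed on human-curated annotations (also released). We provide multiple baselines and an analysis of popular FMs on FETA, leading to several interesting findings that we believe will be very valuable to the FM community, paving the way towards real-world application of FMs to practical expert tasks currently "overlooked" by standard benchmarks that focus on common objects.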
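This paper proposes a novel inverse kinematics (IK) solver for articulated robotic systems, intended for path planning. IK is a traditional but essential problem in robot manipulation. Recently, data-driven methods have been proposed to solve IK quickly for path planning. These methods can handle a large number of IK requests at once by exploiting GPUs. However, their accuracy is still low, and the models require considerable training time. Therefore, we propose an IK solver that improves accuracy and memory efficiency by utilizing the continuous hidden dynamics of Neural ODEs. The performance is compared using multiple robots.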
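Getting the most out of limited resources allows advances in natural language processing (NLP) research and practice while being conservative with resources. Those resources may be data, time, storage, or energy. Recent work in NLP has yielded interesting results from scaling; however, using only scale to improve results means that resource consumption also scales. That relationship motivates research into efficient methods that require fewer resources to achieve similar results. This survey covers methods and findings on efficiency in NLP, aiming to guide new researchers in the field and inspire the development of new methods.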
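While vision-and-language models perform well on tasks such as visual question answering, they struggle when it comes to basic human commonsense reasoning skills. In this work, we introduce WinoGAViL: an online game to collect vision-and-language associations (e.g., werewolves to a full moon), used as a dynamic benchmark to evaluate state-of-the-art models. Inspired by the popular card game Codenames, a spymaster gives a textual cue related to several visual candidates, and another player has to identify them. Human players are rewarded for creating associations that are challenging for a rival AI model but still solvable by other human players. We use the game to collect 3.5K instances, finding that they are intuitive for humans (>90% Jaccard index) but challenging for state-of-the-art AI models, where the best model (ViLT) achieves a score of 52%, succeeding mostly where the cue is visually salient. Our analysis, as well as the feedback we collect from players, indicates that the collected associations require diverse reasoning skills, including general knowledge, common sense, abstraction, and more. We release the dataset, the code, and the interactive game, aiming to allow future data collection that can be used to develop models with better association abilities.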
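The scores above are reported as a Jaccard index, which compares the set of candidates a solver selected with the set the spymaster intended. A minimal sketch follows; the function name and the example labels are hypothetical, and treating two empty selections as a perfect match is a convention chosen here, not something stated in the abstract.

```python
def jaccard_index(predicted, gold):
    """Size of the intersection divided by the size of the union of two selections."""
    predicted, gold = set(predicted), set(gold)
    if not predicted and not gold:
        return 1.0  # convention for two empty selections (an assumption)
    return len(predicted & gold) / len(predicted | gold)


# One shared item out of three distinct items in total.
print(jaccard_index(["moon", "wolf"], ["moon", "forest"]))  # 0.3333333333333333
```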